Testing Strategy
This document defines the testing strategy for Agentic Browser, covering unit, integration, API, and browser-extension testing. It explains the frameworks and patterns used, mock strategies for external dependencies, and how to test AI agent behavior, tool execution, service integrations, browser automation, MCP protocol communication, WebSocket functionality, and extension messaging. It also provides guidance for asynchronous operations and external API interactions, examples of test implementation, CI setup, and automated workflows, and approaches to performance, security, and user acceptance testing, with attention to the challenges specific to AI systems, browser automation, and multi-component architectures.
Agentic Browser comprises:
A Python MCP server that exposes tools and orchestrates LLM-based actions
A FastAPI-based HTTP API server
An agent runtime built on LangGraph and LangChain
A browser extension (React + Vite) with WebExtensions APIs and WebSocket client
Services and tools that integrate with external providers (Gmail, Calendar, GitHub, YouTube, PyJIIT, web search)
Configuration and environment management
```mermaid
graph TD
    subgraph "Backend"
        MCP["MCP Server<br/>mcp_server/server.py"]
        API["FastAPI Server<br/>api/run.py"]
        CFG["Config & Env<br/>core/config.py"]
        LLM["LLM Adapter<br/>core/llm.py"]
        AG["React Agent<br/>agents/react_agent.py"]
        RT["Agent Tools<br/>agents/react_tools.py"]
        BRSVC["Browser Use Service<br/>services/browser_use_service.py"]
    end
    subgraph "Extension"
        EXT["React Extension<br/>extension/*"]
        WS["WebSocket Client<br/>extension/entrypoints/utils/websocket-client.ts"]
    end
    subgraph "External"
        GAPI["Gmail API"]
        CGAPI["Calendar API"]
        GHAPI["GitHub API"]
        YTAPI["YouTube API"]
        WEB["Web Search / Websites"]
    end
    EXT --> WS
    WS --> MCP
    WS --> API
    MCP --> LLM
    MCP --> RT
    RT --> GAPI
    RT --> CGAPI
    RT --> GHAPI
    RT --> YTAPI
    RT --> WEB
    BRSVC --> LLM
    AG --> RT
```
MCP Server: Exposes tools (LLM generation, GitHub QA, website markdown fetch/convert) and routes tool calls to implementations. It runs via stdio and integrates with LangChain LLM clients.
FastAPI Server: Runs uvicorn and serves the HTTP API.
Config and Environment: Loads environment variables and sets logging levels.
LLM Adapter: Provider-agnostic LLM client factory supporting multiple providers and base URLs.
React Agent: LangGraph-based agent that decides when to use tools and executes them asynchronously.
Agent Tools: Structured tools for GitHub, web search, website QA, YouTube QA, Gmail, Calendar, PyJIIT, and browser action generation.
Browser Use Service: Generates JSON action plans from goals and DOM context using LLMs and sanitizes outputs.
Extension: React app with sidepanel, multi-session chat, and WebSocket client for real-time communication with the backend.
The system is composed of:
CLI entrypoint selecting between API and MCP modes
MCP server exposing tools and invoking LLM adapters
Agent runtime orchestrating tool use and execution
Services and tools integrating with external APIs
Extension communicating with backend via WebSocket and UI
MCP Server Testing
Approach:
Unit tests for tool discovery and tool invocation handlers
Mock provider clients to isolate LLM behavior and external API calls
Test error propagation and invalid tool names
Validate input schemas and required fields
Frameworks and patterns:
Use pytest for unit tests
Use unittest.mock or pytest-mock for mocking provider clients and external services
Parameterized tests for supported providers and input variations
Mock strategies:
Replace provider clients with mocks that return deterministic responses
Stub external tool dependencies (e.g., GitHub markdown fetcher) to controlled fixtures
Simulate network errors and timeouts to validate error handling
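These patterns can be sketched as plain assert-style tests that pytest collects directly. The `handle_tool_call` function and the mocked `complete` method are hypothetical stand-ins for the real tool-dispatch code, not the project's actual API:

```python
from unittest.mock import Mock

# Hypothetical stand-in for an MCP tool handler that delegates to an LLM client.
def handle_tool_call(name: str, arguments: dict, llm_client) -> str:
    tools = {"generate_text": lambda args: llm_client.complete(args["prompt"])}
    if name not in tools:
        raise ValueError(f"unknown tool: {name}")
    if "prompt" not in arguments:
        raise ValueError("missing required field: prompt")
    return tools[name](arguments)

def test_tool_call_uses_mocked_provider():
    # Replace the provider client with a mock returning a deterministic response.
    llm = Mock()
    llm.complete.return_value = "deterministic answer"
    result = handle_tool_call("generate_text", {"prompt": "hi"}, llm)
    assert result == "deterministic answer"
    llm.complete.assert_called_once_with("hi")

def test_invalid_tool_name_is_rejected():
    # Invalid tool names must surface as clear errors, not silent failures.
    try:
        handle_tool_call("no_such_tool", {"prompt": "hi"}, Mock())
    except ValueError as exc:
        assert "unknown tool" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

The same structure extends to schema checks: a test per required field, each asserting on the error message.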
LLM Adapter Testing
Approach:
Unit tests for provider selection and parameter mapping
Tests for missing API keys and base URLs
Tests for model initialization failures and fallback behavior
Validation of default model selection and provider-specific overrides
Mock strategies:
Patch provider client constructors to avoid network calls
Inject environment variables for API keys and base URLs
Simulate provider-specific exceptions to validate error handling
Best practices:
Keep provider-specific logic isolated behind a configuration map
Validate inputs early and fail fast with clear error messages
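A minimal sketch of provider selection and fail-fast key validation, assuming a hypothetical `make_llm_client` factory and illustrative environment-variable names (the real adapter's provider map will differ):

```python
import os
from unittest.mock import patch

# Hypothetical provider -> API-key-variable map; names are assumptions.
PROVIDERS = {"openai": "OPENAI_API_KEY", "groq": "GROQ_API_KEY"}

def make_llm_client(provider: str) -> dict:
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    key = os.environ.get(PROVIDERS[provider])
    if not key:
        # Fail fast with a clear message instead of a later network error.
        raise RuntimeError(f"{PROVIDERS[provider]} is not set")
    return {"provider": provider, "api_key": key}

def test_provider_selection_with_injected_env():
    # Inject the API key via patch.dict rather than touching the real env.
    with patch.dict(os.environ, {"GROQ_API_KEY": "test-key"}):
        client = make_llm_client("groq")
    assert client == {"provider": "groq", "api_key": "test-key"}

def test_missing_api_key_fails_fast():
    with patch.dict(os.environ, {}, clear=True):
        try:
            make_llm_client("openai")
        except RuntimeError as exc:
            assert "OPENAI_API_KEY" in str(exc)
        else:
            raise AssertionError("expected RuntimeError")
```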
Agent Runtime Testing
Approach:
Unit tests for message normalization and payload conversion
Integration tests for the compiled LangGraph workflow
Tests for tool binding and conditional edges
Async execution tests for tool calls and agent steps
Mock strategies:
Replace LLM client with a deterministic mock that returns fixed responses
Mock tool implementations to return controlled outputs
Use asyncio event loop controls to manage concurrency
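The deterministic-mock pattern for async agent steps can be sketched as follows. `agent_step` and `ainvoke` are illustrative stand-ins for the LangGraph node logic, not the project's real signatures:

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical async agent step: ask the LLM what to do next and, if it
# requests a tool, execute that tool asynchronously.
async def agent_step(llm, tools: dict) -> str:
    decision = await llm.ainvoke("what next?")
    if decision.get("tool"):
        return await tools[decision["tool"]](decision["args"])
    return decision["answer"]

def test_agent_executes_tool_chosen_by_mock_llm():
    # AsyncMock lets us await the LLM call without any network traffic.
    llm = AsyncMock()
    llm.ainvoke.return_value = {"tool": "search", "args": "langgraph"}

    async def fake_search(query):
        return f"results for {query}"

    result = asyncio.run(agent_step(llm, {"search": fake_search}))
    assert result == "results for langgraph"
```

With pytest-asyncio, the same test body can be written as an `async def` test and awaited directly instead of wrapping it in `asyncio.run`.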
Agent Tools Testing
Approach:
Unit tests for each tool’s input schema validation
Integration tests for tool execution with mocked external services
Tests for optional credentials and default token/session handling
Tests for error handling and informative error messages
Mock strategies:
Replace external API calls with fixtures and controlled responses
Use partial application to inject default tokens/sessions
Simulate OAuth failures and rate limits
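A sketch of the partial-application and failure-simulation patterns, using a hypothetical Gmail-style `list_messages` tool (the real tools and default-token values differ):

```python
from functools import partial
from unittest.mock import Mock

# Hypothetical tool: falls back to a default session token when none is given.
def list_messages(api_client, token=None):
    token = token or "default-session-token"
    return api_client.fetch(token=token)

def test_default_token_injected_via_partial():
    api = Mock()
    api.fetch.return_value = ["msg-1"]
    # Bind the client once; callers then invoke the tool with no arguments.
    tool = partial(list_messages, api)
    assert tool() == ["msg-1"]
    api.fetch.assert_called_once_with(token="default-session-token")

def test_rate_limit_is_propagated():
    # side_effect simulates the external API raising on a rate limit.
    api = Mock()
    api.fetch.side_effect = RuntimeError("429 Too Many Requests")
    try:
        list_messages(api)
    except RuntimeError as exc:
        assert "429" in str(exc)
    else:
        raise AssertionError("expected RuntimeError")
```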
Browser Use Service Testing
Approach:
Unit tests for prompt construction and LLM invocation
Tests for DOM info formatting and constraints handling
Tests for action plan sanitization and validation
Error handling for LLM failures and sanitizer issues
Mock strategies:
Mock the LLM chain to return deterministic outputs
Provide synthetic DOM structures and constraints
Validate sanitized outputs meet expected schema
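Sanitizer tests can be sketched like this. The `sanitize_plan` function and its allow-list are hypothetical; the real service has its own schema, but the test shape (feed raw LLM-style JSON, assert on the validated plan, assert rejection of disallowed actions) carries over:

```python
import json

# Hypothetical allow-list; the real sanitizer defines its own action set.
ALLOWED_ACTIONS = {"navigate", "click", "type"}

def sanitize_plan(raw: str) -> list:
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("action plan must be a list")
    for step in plan:
        if step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"disallowed action: {step.get('action')}")
    return plan

def test_sanitized_output_matches_expected_schema():
    # Deterministic stand-in for the LLM chain's raw JSON output.
    raw = '[{"action": "click", "selector": "#submit"}]'
    plan = sanitize_plan(raw)
    assert plan[0]["action"] == "click"

def test_disallowed_action_rejected():
    try:
        sanitize_plan('[{"action": "eval", "code": "alert(1)"}]')
    except ValueError as exc:
        assert "disallowed" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```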
Extension Messaging and WebSocket Testing
Approach:
Unit tests for WebSocket client initialization and connection handling
Integration tests simulating message exchange between extension and backend
Tests for UI components that depend on WebSocket state
Mock backend responses to validate UI rendering and error handling
Frameworks and patterns:
Use pytest for unit tests
Use pytest-asyncio for async WebSocket operations
Use mocking to simulate backend MCP/API responses
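A message-exchange test can be sketched with a mocked socket, so no server is needed. `request_chat_reply` and the message shapes here are illustrative, not the extension's actual protocol:

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical exchange: send a chat request, turn the backend reply into
# UI-facing state. The websocket object is mocked, so no network is involved.
async def request_chat_reply(ws, prompt: str) -> dict:
    await ws.send({"type": "chat", "prompt": prompt})
    reply = await ws.recv()
    if reply.get("type") == "error":
        return {"status": "error", "detail": reply.get("detail", "unknown")}
    return {"status": "ok", "text": reply["text"]}

def test_mocked_backend_response_updates_state():
    ws = AsyncMock()
    ws.recv.return_value = {"type": "chat_reply", "text": "hello"}
    state = asyncio.run(request_chat_reply(ws, "hi"))
    assert state == {"status": "ok", "text": "hello"}
    ws.send.assert_awaited_once_with({"type": "chat", "prompt": "hi"})

def test_backend_error_is_surfaced():
    # Error replies should become renderable error state, not exceptions.
    ws = AsyncMock()
    ws.recv.return_value = {"type": "error", "detail": "backend down"}
    state = asyncio.run(request_chat_reply(ws, "hi"))
    assert state["status"] == "error"
```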
API Testing
Approach:
Unit tests for server startup and configuration loading
Integration tests for FastAPI endpoints (if present)
Tests for environment-driven configuration and logging
Frameworks and patterns:
Use pytest with FastAPI test client
Mock external dependencies during API tests
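Environment-driven configuration tests can be sketched without touching the real environment. The `load_config` function and the `AB_LOG_LEVEL` / `AB_PORT` variable names are illustrative assumptions, not the project's real settings:

```python
import logging
import os
from unittest.mock import patch

# Hypothetical config loader mirroring environment-driven settings.
def load_config() -> dict:
    return {
        "log_level": getattr(logging, os.environ.get("AB_LOG_LEVEL", "INFO")),
        "port": int(os.environ.get("AB_PORT", "8000")),
    }

def test_defaults_apply_without_env():
    # clear=True guarantees a clean environment for the defaults check.
    with patch.dict(os.environ, {}, clear=True):
        cfg = load_config()
    assert cfg == {"log_level": logging.INFO, "port": 8000}

def test_env_overrides_are_honored():
    with patch.dict(os.environ, {"AB_LOG_LEVEL": "DEBUG", "AB_PORT": "9001"}):
        cfg = load_config()
    assert cfg["log_level"] == logging.DEBUG
    assert cfg["port"] == 9001
```

For the endpoints themselves, FastAPI's `TestClient` follows the same pattern: construct the app under a patched environment, then assert on response status and body.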
Key dependencies and testing implications:
MCP server depends on LLM adapter and agent tools
Agent runtime depends on LLM adapter and tools
Browser use service depends on LLM adapter and sanitizer
Extension depends on WebSocket client and backend servers
Asynchronous tool execution: Ensure tests capture latency and concurrency behavior
LLM cost and rate limits: Use throttling and caching in tests; mock providers to avoid quota issues
Browser automation: Limit DOM size and action plan complexity; validate sanitization overhead
WebSocket throughput: Test message batching and reconnection logic
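Reconnection logic is easiest to test deterministically by asserting on the computed delay schedule rather than actually sleeping. This sketch assumes a simple capped exponential backoff; the base delay and cap are illustrative, not the extension's real values:

```python
# Hypothetical reconnection backoff: doubles each attempt, capped.
def backoff_delays(base: float = 0.5, cap: float = 8.0, attempts: int = 6):
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

def test_backoff_grows_then_caps():
    delays = backoff_delays()
    assert delays == [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
    # The schedule must never decrease between attempts.
    assert delays == sorted(delays)
```

Keeping the schedule as a pure function makes the throughput and reconnection tests instantaneous, with no real timers involved.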
Common issues and remedies:
Missing environment variables for API keys or base URLs: Validate configuration loading and provide clear error messages
Provider client initialization failures: Add fallbacks and logging; test with invalid configurations
Tool execution errors: Capture and propagate errors with context; validate input schemas
WebSocket connection drops: Implement retry logic and UI feedback; test disconnection/reconnection flows
This testing strategy emphasizes isolation of external dependencies, deterministic mocking, and comprehensive coverage of asynchronous flows. By structuring tests around the MCP server, agent runtime, tools, services, and extension, teams can ensure reliable behavior across model providers, browser automation, and multi-component integrations.
Testing Best Practices for Asynchronous Operations
Use pytest-asyncio for async tests
Prefer deterministic mocks over real network calls
Test timeout and cancellation paths
Validate concurrency and resource cleanup
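The timeout and cancellation paths can be exercised with stdlib asyncio alone. `slow_tool` is a stand-in for any long-running tool call:

```python
import asyncio

# Hypothetical long-running operation standing in for a slow tool call.
async def slow_tool():
    await asyncio.sleep(10)
    return "done"

def test_timeout_path():
    async def run():
        try:
            await asyncio.wait_for(slow_tool(), timeout=0.01)
        except asyncio.TimeoutError:
            return "timed out"
        return "finished"
    assert asyncio.run(run()) == "timed out"

def test_cancellation_runs_cleanup():
    cleaned = []

    async def tool_with_cleanup():
        try:
            await asyncio.sleep(10)
        finally:
            cleaned.append(True)  # resource cleanup must run even on cancel

    async def run():
        task = asyncio.create_task(tool_with_cleanup())
        await asyncio.sleep(0)  # let the task start before cancelling
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    asyncio.run(run())
    assert cleaned == [True]
```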
Browser Automation Scenarios
Simulate DOM structures and constraints
Validate action plan generation and sanitization
Test navigation, click, and type actions
Verify tab management commands
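Action execution can be verified against a fake in-memory page instead of a real browser. The `execute_plan` dispatcher and action schema below are hypothetical simplifications of the real extension behavior:

```python
# Hypothetical executor applying navigate/click/type steps to a fake page.
def execute_plan(plan, page):
    for step in plan:
        action = step["action"]
        if action == "navigate":
            page["url"] = step["url"]
        elif action == "click":
            page["clicked"].append(step["selector"])
        elif action == "type":
            page["fields"][step["selector"]] = step["text"]
        else:
            raise ValueError(f"unknown action: {action}")
    return page

def test_navigation_click_and_type_actions():
    page = {"url": "about:blank", "clicked": [], "fields": {}}
    plan = [
        {"action": "navigate", "url": "https://example.com/login"},
        {"action": "type", "selector": "#user", "text": "alice"},
        {"action": "click", "selector": "#submit"},
    ]
    result = execute_plan(plan, page)
    assert result["url"] == "https://example.com/login"
    assert result["fields"]["#user"] == "alice"
    assert result["clicked"] == ["#submit"]
```

Tab-management commands follow the same pattern: model open tabs as a list in the fake state and assert on it after each command.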
External API Interactions
Mock OAuth flows and API responses
Validate error propagation and user-friendly messages
Test optional credentials and default session handling
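An OAuth-failure test can be sketched with a mocked flow object. `fetch_calendar_events`, `obtain_credentials`, and the error text are all hypothetical stand-ins, not the real Google client API:

```python
from unittest.mock import Mock

# Hypothetical OAuth-backed call that converts auth failures into a
# user-friendly message instead of a raw exception.
def fetch_calendar_events(oauth_flow):
    try:
        creds = oauth_flow.obtain_credentials()
    except PermissionError:
        return {"error": "Authorization failed; please reconnect your Google account."}
    return {"events": creds.service_events()}

def test_oauth_failure_yields_friendly_message():
    flow = Mock()
    flow.obtain_credentials.side_effect = PermissionError("invalid_grant")
    result = fetch_calendar_events(flow)
    assert "reconnect" in result["error"]

def test_successful_flow_returns_events():
    flow = Mock()
    flow.obtain_credentials.return_value.service_events.return_value = ["standup"]
    assert fetch_calendar_events(flow) == {"events": ["standup"]}
```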
Continuous Integration Setup
Separate jobs for backend and extension
Backend job: run Python unit tests and integration tests
Extension job: run TypeScript checks and build verification
Cache dependencies and reuse virtual environments
Automated Testing Workflows
Pre-submit checks: lint, type checks, unit tests
Post-submit checks: integration tests against staging
Nightly smoke tests: end-to-end MCP and WebSocket flows
Security Testing Approaches
Input validation and sanitization for agent inputs and DOM structures
Authorization checks for tools requiring credentials
Audit logs for all tool invocations and browser actions
User Acceptance Testing
Define scenarios for agent workflows and browser automation
Validate UI rendering and user feedback for WebSocket status
Collect regression tests from real-world usage